ABSTRACT


Expressed desires of lecturers and members at the NATO Advanced Study Institute,
July 1965, led the author to prepare guidelines for using statisticians in
retrieval system evaluation studies. These guidelines describe the questions
statisticians ask to obtain information concerning the retrieval system
environment. Specific statistician tasks are identified. The user support
requirements are outlined. Constraints in the utilization of a statistician
are discussed. The paper was conceived, written, reproduced, and distributed
during the Institute's two-week session.
GLOSSARY


1. Criterion Measure = A scale which represents an acceptable (to
management) standard against which all other measures of system
performance are evaluated by analytical techniques. For example,
document value to the user is an underlying concept for a criterion
measure.

2. Analysis Units = Data points. That is, a sample number with a
combination of characteristics which defines a manipulatable set of
information.

3. Objectivity = The quality or state of being verifiable by
scientific methods.

4. Reliability = The quality or state of being consistent upon
repeated measurement or observation.

5. Validity = The quality or state of being correlated or in harmony
with the criterion.

6. Subjective Data = Observations of system behavior phenomena which
involve human perception, including opinions, ratings, etc.

7. Parametric Statistics = A set of statistical procedures which
specifies rigid requirements for measurement and sampling, e.g.
interval scales, large samples, etc.

8. Degrees of Freedom = The number of observations which are free to
vary after certain restrictions have been placed on the data by the
design of the analysis or choice of statistical tool.

9. Cross-validation = The verification of the results of an initial
evaluation by a second analysis of data with a different sample using
the same variables.

10. Frame of Reference = A systematic set of principles, rules, or
presuppositions or a system of laws, mores, or values, or an
interlocking group of facts or ideas serving to orient or give
particular meaning (as to fact, statement, or point of view), or
serving as a matrix for behavior or the formation of attitudes.


12 July 1966

SP-2556/000/00

TABLE OF CONTENTS

Abstract
Glossary

1.0 INTRODUCTION

2.0 QUESTIONS ASKED BY THE STATISTICIAN
    2.1 What Is the Problem?
    2.2 What Data Is Available or Can Be Obtained?
    2.3 What Is the Criterion?
    2.4 What Resources Are Available for Data Gathering?
    2.5 What Resources Are Available for Data Analysis?
    2.6 What Subjects Are Available (if required)?
    2.7 How Much Calendar Time Is Available for Data Gathering
        and Analysis?
    2.8 What Results of Prior Studies on the Problem or Related
        Problems Are Available?

3.0 WHAT THE STATISTICIAN WILL DO
    3.1 Identify Basic Assumptions
    3.2 Define the Variables and Analysis Units
    3.3 Determine Analysis Strategy
    3.4 Provide Logical Basis for Interpreting the Results
        Including Generalizing Limitations
    3.5 Determine the Degree of Definitiveness Obtainable
    3.6 Relate Results to Prior Study Results
    3.7 Author or Co-Author Final Report(s)
    3.8 Estimate Costs

4.0 FROM THE USER SUPPORT FOR THE STATISTICIAN IS NEEDED IN THE
    FOLLOWING:

5.0 CONSTRAINTS IN THE UTILIZATION OF A STATISTICIAN
    5.1 Unfamiliarity with the System
    5.2 Short Supply of Adequate Statisticians
    5.3 Order of Desirability in Statistician Types
    5.4 Contractor or Staff Member




1.0  Introduction 




The  need  for  statistical  support  in  designing  and  executing  evaluation 
studies  became  more  clear  as  the  NATO  Advanced  Study  Institute  conducted 
its  proceedings.  Several  lecturers  and  members  expressed  a  desire  for 
help  in  using  this  kind  of  support. 

Informal  discussions  revealed  an  interest  in  having  guidelines  for  the 
utilization  of  statisticians  as  a  part  of  the  report  on  the  proceedings 
of  this  Institute.  In  response  to  this  need  the  author  formulated  this 
preliminary  draft  as  proposed  guidelines. 

2.0  Questions  Asked  by  the  Statistician 

The statistician will need to have answers to certain kinds of questions
before he can begin to perform his job. These questions are based on
information which only the system user knows. Typical questions are as
follows:

2.1  What  Is  the  Problem? 

Above all, the statistician needs a clear description of the problem.
This should take the form of a summary of the system covering the
objectives, the conditions, the requirements and the constraints.
The later questions in this section are described in more detail.
A clear understanding of the problem between the user and the
statistician will have an important bearing on the effectiveness of
the statistician's performance.

2.2  What  Data  Is  Available  or  Can  Be  Obtained? 

The first specific information the statistician needs is
an objective description of the kind of data to be used in the
analysis. At this point it is not necessary to define the
variables. That is the statistician's job at a later stage of the
design development.

2.3 What Is the Criterion?

The most important variable is the criterion measure, so early in
the discussions between the user and the statistician the logic of
the criterion measure must be determined.

2.4  What  Resources  Are  Available  for  Data  Gathering? 

The design of the study must take into account the resources which
are available for data gathering. Will the statistician acquire
data? Will the user supply all the data? What communication media
are required for gathering data? Are data already available? How
much additional data are required? These are questions which must
be answered.

2.5 What Resources Are Available for Data Analysis?

The analysis of data usually requires three categories of resources:
clerical, professional, and machine or equipment.

2.5.1 Clerical

Three types of clerical assistance are usually needed:
1) data recording, 2) editing capabilities, and 3) statistical
capabilities.

2.5.2  Professional 

If the data are complex and require a comprehensive knowledge
for certain steps in processing, professional capabilities
are required. Usually one or more subject matter specialists
are necessary for this part of data preparation and analysis.

In addition, if complex computer programming is required, a
professional programmer is necessary. Even when computer
programs are already available, the part-time services
of a professional programmer will enhance the quality of the
study.

2.5.3 Machine or Equipment

The volume of data, the complexity of the design, and the
number of steps and processes are important factors which
determine machine or equipment requirements. The requirements
may vary from the use of a desk calculator to complex
electrical accounting machines or possibly computers or
specialized equipment.

The machine requirements can represent a considerable financial
cost, but at the same time they can bring about a great saving in
time and increase the meaningfulness of the output. Machine
requirements grow with the number of variables, the size of the
sample, and the complexity of the analytical techniques.

2.6  What  Subjects  Are  Available  (if  required)? 

If judgments or ratings are to be required from professional subjects
or other types of subjects in developing criterion and/or reference
measures, an early assessment of these requirements is critical.
Arrangements to obtain the necessary subjects should be initiated
early.

2.7 How Much Calendar Time Is Available for Data Gathering and Analysis?

As the prior questions are answered, the amount of calendar time needed
to accomplish the requirements will tend to be larger than the user
anticipated. A review of all requirements against the calendar time
actually available will often lead to a revision in objectives and
other study design conditions.

2.8 What Results of Prior Studies on the Problem or Related Problems
Are  Available? 

Before  the  final  design  is  established  it  is  important  that  the 
statistician  have  all  available  results  of  prior  studies  or 
activities  which  are  related  to  the  problem. 

3.0  What  the  Statistician  Will  Do 

After  the  statistician  has  had  an  opportunity  to  study  and  evaluate 
discussions  with  the  user  on  the  questions  described  in  the  previous 
section  he  will  be  prepared  to  perform  his  assignment. 

3.1  Identify  Basic  Assumptions 

On  the  basis  of  prior  studies  and  insight  into  the  nature  of  the 
problem,  basic  assumptions  are  identified  by  the  statistician. 

3.2 Define the Variables and Analysis Units

Starting from the basic assumptions, the statistician defines the
variables and analysis units.

3.3 Determine Analysis Strategy

At  this  point  the  crucial  parts  of  the  study  design  are  formulated 
by  the  statistician.  These  include  objectivity  and  reliability 
controls,  statistical  tool  requirements,  degrees  of  freedom 
requirements,  estimates  of  expected  validity  and  reliability  levels, 
and  any  special  features  of  the  design  which  are  implied  in  the 
problem  and  objectives. 

A. Determine Objectivity and Reliability Controls

At this point the sampling design comes under consideration, in
addition to the methods for gathering subjective data.
If a large part of the data are subjective in nature, a large
number of control variables are necessary for properly assessing
the objectivity of the data. The validity of the results
requires testing, and such tests should be provided for in the
sampling design. For example, fortuitous effects in the data
may show favorable results in a single sample. Such effects
may not exist in another random sample. Consequently,
conclusions based on a single sample are risky. It may be
necessary to add control variables at this point.
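
The risk of relying on a single sample can be sketched with a small
simulation. The example below is an illustrative assumption, not part of
the original guidelines: all variables and the criterion are pure random
noise, the variable that looks best in a first sample is selected, and
the same variable is then checked in a second, independent sample. The
function names, sample sizes and seed are hypothetical.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def cross_validate_best_variable(n_cases=30, n_variables=20, seed=0):
    """Select the variable most correlated with the criterion in one sample,
    then measure that same variable's correlation in a second sample.

    All data are hypothetical noise, so any strong first-sample
    correlation is fortuitous by construction.
    """
    rng = random.Random(seed)

    def draw_sample():
        criterion = [rng.gauss(0, 1) for _ in range(n_cases)]
        variables = [[rng.gauss(0, 1) for _ in range(n_cases)]
                     for _ in range(n_variables)]
        return criterion, variables

    crit_a, vars_a = draw_sample()  # initial evaluation sample
    crit_b, vars_b = draw_sample()  # cross-validation sample

    first_rs = [pearson(v, crit_a) for v in vars_a]
    best = max(range(n_variables), key=lambda i: abs(first_rs[i]))
    second_r = pearson(vars_b[best], crit_b)
    return best, first_rs[best], second_r, first_rs


if __name__ == "__main__":
    best, r_first, r_second, _ = cross_validate_best_variable()
    print(f"variable {best}: first sample r = {r_first:.2f}, "
          f"second sample r = {r_second:.2f}")
```

Because the first-sample correlation is the largest of twenty chance
values, it is inflated by the act of selection; the second-sample value
for the same variable is not subject to that selection and will usually
lie much nearer zero.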

B. Determine Statistical Tool Requirements (Parametric or
Non-Parametric)

The statistical literature contains a large number of useful tools
for analysis. The law of parsimony should be applied to the
selection of statistical tools. Sophisticated tools should
never be used merely because they are available or because a
statistician knows how to use them. In some cases specialized
techniques or new applications of existing tools may be
required. In rare cases novel techniques or tools will be
required. The development of novel techniques is usually very
costly.

C.  Determine  the  Degrees  of  Freedom  Requirements 

The number of degrees of freedom required is determined
by a combination of factors, such as the size of the
population or universe, the complexity of the data in terms of
numbers of variables or data characteristics, the degree of
subjectivity in the data and the complexity of the analysis
design. The degrees of freedom requirement in turn fixes the
sample size, or number of cases, required for the analysis.

At  this  point  the  objectivity,  reliability  and  validity 
controls  should  be  taken  into  account.  Provision  should  be 
made  for  adequate  cross-validation  of  results. 
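
As a rough illustration of how degrees of freedom relate to the number
of cases, the sketch below applies the glossary definition (observations
free to vary after the analysis places restrictions on the data) to four
common analyses. The formulas are the standard textbook ones; the choice
of these particular analyses and the function names are assumptions made
for illustration only.

```python
def df_one_sample_mean(n_cases):
    """Estimating one mean uses up one restriction: n - 1 remain."""
    return n_cases - 1


def df_two_sample_pooled(n_first, n_second):
    """Comparing two group means with pooled variance: n1 + n2 - 2."""
    return n_first + n_second - 2


def df_regression_residual(n_cases, n_predictors):
    """Residual degrees of freedom with an intercept and k predictors."""
    return n_cases - n_predictors - 1


def df_contingency_table(n_rows, n_cols):
    """Chi-square test of independence on an r x c table."""
    return (n_rows - 1) * (n_cols - 1)


if __name__ == "__main__":
    # With 30 cases, each added variable leaves fewer free observations.
    for k in (1, 4, 10):
        print(k, "predictors:", df_regression_residual(30, k), "residual df")
```

The same number of cases therefore supports fewer free observations as
the design grows more complex, which is why the sample size has to be
planned together with the number of variables.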

D.  Estimate  the  Validity  Level 

At this point a statistician should have sufficient knowledge
of the problem, design, objectives, and the data to make an
estimate of the level of validity which can be obtained. In
cases where a problem requires a pilot study design, this
estimate will be too difficult to make.

E. Estimate the Reliability Level

In the case of reliability the experienced statistician should
be able to make a reasonable estimate of the expected reliability
for the criterion measures and other variables.
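
One common way to put a number on reliability, in the glossary sense of
consistency upon repeated measurement, is the correlation between two
sets of ratings of the same items. The sketch below assumes this
test-retest approach; the guidelines do not prescribe a particular
coefficient, and the ratings shown are hypothetical.

```python
def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


# Hypothetical relevance ratings of five documents by the same judge
# on two separate occasions.
first_ratings = [3, 4, 2, 5, 1]
second_ratings = [3, 5, 2, 4, 1]

reliability = pearson(first_ratings, second_ratings)
print(f"test-retest reliability = {reliability:.2f}")
```

The same function applied to a variable and the criterion measure gives
a correlation of the kind the glossary definition of validity refers to.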

3.4  Provide  Logical  Basis  for  Interpreting  the  Results  Including 
Generalizing  Limitations. 

On  the  basis  of  the  review  of  prior  results  and/or  agreement 
with  the  user  of  the  system  the  statistician  formulates  a  frame 
of  reference  for  interpreting  the  data.  He  should  also  evaluate 
the  context  of  the  problem  and  assess  the  extent  to  which  the 
expected  results  can  be  applied  or  generalized  to  related  problems. 

3.5 Determine the Degree of Definitiveness Obtainable

The statistician will probably be overcautious in claiming that
definitive results will be obtained. However, the user will want
to know how much reliance he can place on the expected results
in order to justify the costs. Therefore the statistician is
required to "stick his neck out".

3.6  Relate  Results  to  Prior  Study  Results 

As a part of the design for this particular study the statistician
should select tabular and other statistical material from prior
studies which may be incorporated in the report for the study in
question. These data from prior reports should also be considered
under item D of section 3.3 (Estimate the Validity Level).

3.7  Author  or  Co-Author  Final  Report(s) 

Agreement should be reached between the user of the system, including
representative professional personnel, and the statistician
concerning responsibility for preparing final reports and author
credits.

3.8  Estimate  Costs 


A final but not least important consideration is the matter of
financial and other costs. These can usually be defined as calendar
time requirements, personnel requirements, machine or equipment
requirements, and overhead requirements.

A. Temporal Requirements

After  all  the  previous  steps  have  been  accomplished  a  realistic 
date  for  completion  of  the  project  should  be  established 
through  negotiation  between  the  user  and  the  statistician. 

This date should allow for "slippage".

B. Personnel Requirements

A final assessment of personnel for data gathering, analysis,
travel, and report writing is to be made.

C.  Equipment  Requirements 

Last but not least, in terms of financial costs, is the cost
of equipment requirements for the analysis of data.

D.  Overhead  Requirements 

In the course of executing the study, unexpected costs
always arise. Consequently, provision should be made for enough
overhead or fixed fees to ensure support through the
final completion of the project.

4.0 From the User Support for the Statistician is Needed in the Following:

A.  Identifying  Basic  Assumptions. 

B.  Defining  Variables. 

C. Arranging for Subjects (if required).

D.  Arranging  for  Data  Gathering. 

E.  Arranging  for  Data  Analysis: 

1. Clerical

2. Professional

3. Machine or Equipment

F.  Implementing  Data  Analysis. 

5.0 Constraints in the Utilization of a Statistician

Although the need for a statistician is very important, the user of a
system should recognize the constraints which affect the use of this
kind of professional help in evaluation studies. Among these constraints
are the general unfamiliarity of a new statistician with the user's
system, the short supply of adequate statisticians and especially the
difficulty of obtaining the most desirable type.

5.1 Unfamiliarity with the System

System environment, personnel, conventions and type of information
involved represent a complex which can only be adequately
comprehended through long experience.

5.2 Short Supply of Adequate Statisticians

Statisticians are in great demand throughout all fields of
professional activity. The nature of the work is difficult
and adequate competence is not easy to obtain. Therefore other
types of professionals are usually easier to find than statisticians.
In addition the cost of statisticians is high, because their
market value follows the law of supply and demand.

5.3 Order of Desirability in Statistician Types

1. Psychometrician at the Ph.D. or M.A. level with three or more
years of research design and analysis experience and three
or more years of data processing experience including clerical,
EAM, and digital computer.

2. Econometrician, sociometrician or biometrician with the same
qualifications as above.

3. Statistician in any other subject matter field with the same
qualifications as above.

5.4 Contractor or Staff Member


The user should carefully consider the three possible alternatives
in providing for statistical design support, i.e. contractor,
full-time employee, or part-time employee. If the project is very
large and relatively short in duration, a contractor may be
appropriate. If the problem is reasonably large and continuous, a
full-time employee may be justified.